YGGDRASIL - A statistical package for learning Split Models
Author
Abstract
This paper has two main objectives. The first is to present a statistical framework for models with context-specific independence structures, i.e. conditional independencies that hold only for specific values of the conditioning variables. This framework is constituted by the class of split models. Split models are an extension of graphical models for contingency tables and allow for more sophisticated modelling than graphical models. The treatment of split models includes estimation, representation, and a Markov property for reading off the independencies that hold in a specific context. The second objective is to present a software package named YGGDRASIL, which is designed for statistical inference in split models, i.e. for learning such models from data.
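As an illustration of what "context-specific independence" means, the following minimal Python sketch builds a small joint distribution over three binary variables A, B and C in which A and B are independent when C = 0 but dependent when C = 1. The probability table and the helper name `independent_given` are hypothetical and chosen by hand for this example; they are not taken from the paper or from the YGGDRASIL package.

```python
from itertools import product

# Hypothetical joint distribution P(A, B, C), keyed by (a, b, c).
# The numbers are illustrative assumptions, not data from the paper.
joint = {
    # Context C = 0: the table factorizes, so A and B are independent.
    (0, 0, 0): 0.12, (0, 1, 0): 0.18, (1, 0, 0): 0.08, (1, 1, 0): 0.12,
    # Context C = 1: the table does not factorize, so A and B are dependent.
    (0, 0, 1): 0.20, (0, 1, 1): 0.05, (1, 0, 1): 0.05, (1, 1, 1): 0.20,
}


def independent_given(joint, c, tol=1e-9):
    """Check whether A and B are independent in the context C = c."""
    # Probability of the context itself.
    p_c = sum(p for (_, _, cc), p in joint.items() if cc == c)
    # Conditional table P(A, B | C = c).
    cond = {(a, b): p / p_c for (a, b, cc), p in joint.items() if cc == c}
    # Conditional marginals P(A | C = c) and P(B | C = c).
    p_a = {a: sum(cond[(a, b)] for b in (0, 1)) for a in (0, 1)}
    p_b = {b: sum(cond[(a, b)] for a in (0, 1)) for b in (0, 1)}
    # Independence holds if every cell equals the product of its marginals.
    return all(
        abs(cond[(a, b)] - p_a[a] * p_b[b]) <= tol
        for a, b in product((0, 1), repeat=2)
    )


for c in (0, 1):
    print(f"A independent of B given C={c}: {independent_given(joint, c)}")
```

An ordinary graphical model can only state that A and B are independent given C for every value of C, or not at all. In this toy table the independence holds in one context only, which is the kind of structure the abstract says split models are meant to capture.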
Similar resources
Yggdrasil: An Optimized System for Training Deep Decision Trees at Scale
Deep distributed decision trees and tree ensembles have grown in importance due to the need to model increasingly large datasets. However, PLANET, the standard distributed tree learning algorithm implemented in systems such as XGBOOST and Spark MLLIB, scales poorly as data dimensionality and tree depths grow. We present YGGDRASIL, a new distributed tree learning method that outperforms existing...
Splitting It Up: The spduration Split-Population Duration Regression Package for Time-Varying Covariates
We present an implementation of split-population duration regression in the spduration (Beger et al., 2017) package for R that allows for time-varying covariates. The statistical model accounts for units that are immune to a certain outcome and are not part of the duration process the researcher is primarily interested in. We provide insights for when immune units exist, that can significantly ...
Introduction Package CircOutlier For Detection of Outliers in Circular-Circular Regression
One of the most important problems in any statistical analysis is the existence of unexpected observations. Some observations are not part of the study and are known as outliers. Studies have shown that outliers affect the performance of standard statistical methods in models and predictions. The aim of this work is to provide a statistical package in R to identi...
Push-Button Verification of File Systems via Crash Refinement
The file system is an essential operating system component for persisting data on storage devices. Writing bug-free file systems is non-trivial, as they must correctly implement and maintain complex on-disk data structures even in the presence of system crashes and reorderings of disk operations. This paper presents Yggdrasil, a toolkit for writing file systems with push-button verification: Yg...
Boosting Algorithms: Regularization, Prediction and Model Fitting
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selectio...